Distribution of "Characteristic" Terms in MEDLINE Literatures
Authors
Abstract
Given the occurrence frequency of any term within any set of articles within MEDLINE, we define "characteristic" terms as words and phrases that occur in that literature more frequently than expected by chance (at p < 0.001 or better). In this report, we studied how the cut-off criterion varied as a function of literature size and term frequency in MEDLINE as a whole, and compared the distribution of characteristic terms within a number of journal-defined, affiliation-defined and random literatures. We also investigated how the characteristic terms were distributed among MEDLINE titles, abstracts, and the last sentences of abstracts, including "regularized" terms that appear both in the title and abstract of the same paper for at least one paper in the literature. For a set of 10 disciplinary journals, the characteristic terms comprised 18% of the total terms on average. Characteristic terms are used in several of our web-based services (Anne O’Tate and Arrowsmith), and should be useful for a variety of other information-processing tasks designed to improve text mining in MEDLINE.
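As a rough illustration of how such a cut-off could depend on literature size and background term frequency, the sketch below assumes a one-sided binomial test: a term with background frequency p in MEDLINE as a whole is treated as occurring independently in each of the n articles of a literature. The abstract does not state which statistical test the authors used, so the binomial model, the function names, and the example numbers are assumptions made for illustration only.

```python
from math import exp, lgamma, log
from typing import Optional


def log_binom_pmf(i: int, n: int, p: float) -> float:
    """Log of P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    return (
        lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        + i * log(p) + (n - i) * log(1.0 - p)
    )


def min_count_for_significance(n: int, p: float, alpha: float = 0.001) -> Optional[int]:
    """Smallest count k such that P(X >= k) < alpha, or None if no count qualifies.

    n     : number of articles in the literature (hypothetical input)
    p     : background fraction of MEDLINE articles containing the term (0 < p < 1)
    alpha : significance threshold; 0.001 mirrors the abstract's p < 0.001
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    tail = 0.0
    best = None
    # Accumulate the upper tail from k = n downward; it grows as k decreases.
    for k in range(n, -1, -1):
        tail += exp(log_binom_pmf(k, n, p))
        if tail >= alpha:
            break
        best = k
    return best


def is_characteristic(count: int, n: int, p: float, alpha: float = 0.001) -> bool:
    """True if the observed count meets the assumed binomial cut-off."""
    k_min = min_count_for_significance(n, p, alpha)
    return k_min is not None and count >= k_min
```

Under these assumptions, a term found in half of all MEDLINE articles would have to appear in all 10 articles of a 10-article literature to qualify, since 0.5^10 = 1/1024 is just below 0.001, while in a 2-article literature no count could ever reach the threshold.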
Journal title: Information
Volume 2, Issue
Pages -
Publication date: 2011